
    Time Series Cluster Kernel for Learning Similarities between Multivariate Time Series with Missing Data

    Similarity-based approaches represent a promising direction for time series analysis. However, many such methods rely on parameter tuning, and some have shortcomings when the time series are multivariate (MTS), due to dependencies between attributes, or when they contain missing data. In this paper, we address these challenges within the powerful context of kernel methods by proposing the robust time series cluster kernel (TCK). The approach leverages the missing-data handling properties of Gaussian mixture models (GMMs) augmented with informative prior distributions. An ensemble learning approach ensures robustness to parameter choices by combining the clustering results of many GMMs to form the final kernel. We evaluate the TCK on synthetic and real data and compare it to other state-of-the-art techniques. The experimental results demonstrate that the TCK is robust to parameter choices, provides competitive results for MTS without missing data, and gives outstanding results when data are missing.
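
    The ensemble construction can be pictured with a short sketch. The snippet below is a minimal illustration of the TCK idea, not the authors' implementation: it fits many Gaussian mixture models under randomly varied hyperparameters and accumulates inner products of the posterior cluster assignments into a kernel matrix. For brevity it uses scikit-learn's GaussianMixture on complete, vectorized data; the paper's missing-data handling and informative priors are omitted.

```python
# Hedged sketch of a TCK-style ensemble kernel (illustrative, not the paper's code).
import numpy as np
from sklearn.mixture import GaussianMixture

def tck_like_kernel(X, n_models=30, seed=0):
    """X: (n_samples, n_features); MTS are assumed flattened and complete here."""
    rng = np.random.default_rng(seed)
    K = np.zeros((len(X), len(X)))
    for _ in range(n_models):
        q = int(rng.integers(2, 10))  # random number of mixture components
        gmm = GaussianMixture(n_components=q,
                              random_state=int(rng.integers(10**6))).fit(X)
        P = gmm.predict_proba(X)      # posterior cluster assignments, (n, q)
        K += P @ P.T                  # similarity contributed by this base model
    return K / n_models               # averaging over the ensemble gives the kernel

K = tck_like_kernel(np.random.rand(50, 8))  # symmetric PSD similarity matrix
```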

    Classification of postoperative surgical site infections from blood measurements with missing data using recurrent neural networks

    Clinical measurements that can be represented as time series constitute an important fraction of electronic health records and are often both uncertain and incomplete. Recurrent neural networks are a special class of neural networks that are particularly suitable for processing time series data but, in their original formulation, cannot explicitly deal with missing data. In this paper, we explore imputation strategies for handling missing values in classifiers based on recurrent neural networks (RNNs) and apply a recently proposed recurrent architecture, the Gated Recurrent Unit with Decay (GRU-D), specifically designed to handle missing data. We focus on the problem of detecting surgical site infection in patients by analyzing time series of their blood sample measurements, and we compare the results obtained with different RNN-based classifiers.
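
    A minimal sketch of a baseline imputation strategy is given below: it forward-fills each variable, falls back to the column mean for leading gaps, and appends a binary observation mask so a downstream RNN can see which entries were imputed. GRU-D itself, with its learned decay terms, is a custom architecture and is not reproduced here.

```python
# Hedged sketch of imputation plus masking for RNN inputs (illustrative only).
import numpy as np

def impute_for_rnn(x):
    """x: (timesteps, features) with NaN marking missing blood measurements."""
    mask = ~np.isnan(x)                      # 1 where observed, 0 where missing
    filled = x.copy()
    for t in range(1, len(filled)):          # carry the last observation forward
        gaps = np.isnan(filled[t])
        filled[t, gaps] = filled[t - 1, gaps]
    col_mean = np.nanmean(x, axis=0)         # fallback for leading gaps
    filled = np.where(np.isnan(filled), col_mean, filled)
    return np.concatenate([filled, mask.astype(float)], axis=1)

x = np.array([[1.0, np.nan], [np.nan, 2.0], [3.0, np.nan]])
features = impute_for_rnn(x)                 # shape (3, 4): values plus mask
```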

    Noisy multi-label semi-supervised dimensionality reduction

    Noisy labeled data represent a rich source of information that is often easily accessible and cheap to obtain, but label noise can also have many negative consequences if not accounted for. How to fully utilize noisy labels has been studied extensively within the framework of standard supervised machine learning over several decades. However, very little research has been conducted on the challenge posed by noisy labels in non-standard settings, including situations where only a fraction of the samples are labeled (semi-supervised) and each high-dimensional sample is associated with multiple labels. In this work, we present a novel semi-supervised and multi-label dimensionality reduction method that effectively utilizes information from both noisy multi-labels and unlabeled data. With the proposed Noisy multi-label semi-supervised dimensionality reduction (NMLSDR) method, the noisy multi-labels are denoised and the unlabeled data are labeled simultaneously via a specially designed label propagation algorithm. NMLSDR then learns a projection matrix for reducing the dimensionality by maximizing the dependence between the enlarged and denoised multi-label space and the features in the projected space. Extensive experiments on synthetic data, benchmark datasets, and a real-world case study demonstrate the effectiveness of the proposed algorithm and show that it outperforms state-of-the-art multi-label feature extraction algorithms.
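
    The denoising step can be illustrated with a standard label-propagation scheme. The sketch below uses the classic normalized-graph update Y <- alpha*S*Y + (1 - alpha)*Y0 as a generic stand-in for the paper's specially designed algorithm; the RBF affinity graph and the alpha parameter are assumptions of this example, not the paper's choices.

```python
# Hedged sketch of multi-label propagation for denoising (a generic scheme,
# not NMLSDR's own label-propagation algorithm).
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def propagate_labels(X, Y0, alpha=0.8, n_iter=50):
    """X: (n, d) features; Y0: (n, L) noisy/partial multi-label matrix."""
    W = rbf_kernel(X)                           # affinity graph over all samples
    np.fill_diagonal(W, 0.0)
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))             # symmetric normalization
    Y = Y0.astype(float).copy()
    for _ in range(n_iter):
        Y = alpha * (S @ Y) + (1 - alpha) * Y0  # smooth, anchored to the input
    return Y                                    # soft, denoised label matrix
```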

    Time series cluster kernels to exploit informative missingness and incomplete label information

    The time series cluster kernel (TCK) provides a powerful tool for analysing multivariate time series subject to missing data. TCK is designed using an ensemble learning approach in which Bayesian mixture models form the base models. Because of the Bayesian approach, TCK can naturally deal with missing values without resorting to imputation, and the ensemble strategy ensures robustness to hyperparameters, making it particularly well suited for unsupervised learning. However, TCK assumes that data are missing at random and that the underlying missingness mechanism is ignorable, i.e., uninformative, an assumption that does not hold in many real-world applications, such as medicine. To overcome this limitation, we present a kernel capable of exploiting the potentially rich information in the missing values and patterns, as well as the information from the observed data. In our approach, we create a representation of the missing patterns, which is incorporated into mixed-mode mixture models in such a way that the information provided by the missing patterns is effectively exploited. Moreover, we also propose a semi-supervised kernel, capable of taking advantage of incomplete label information to learn more accurate similarities. Experiments on benchmark data, as well as a real-world case study of patients, described by longitudinal electronic health record data, who potentially suffer from hospital-acquired infections, demonstrate the effectiveness of the proposed method.
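
    The representation idea can be sketched compactly: each series is paired with its binary missingness pattern so that a mixed-mode mixture model (continuous values plus a Bernoulli-modelled mask) can exploit informative missingness. The snippet below shows only this encoding, under the assumption that the mixture modelling happens downstream.

```python
# Hedged sketch of the mixed-mode encoding (the mixture models are omitted).
import numpy as np

def mixed_mode_representation(x):
    """x: (timesteps, variables) with NaN marking missing entries."""
    pattern = np.isnan(x).astype(float)   # binary missingness pattern
    values = np.nan_to_num(x, nan=0.0)    # observed values; zeros are placeholders
    return values, pattern                # continuous mode and binary mode

x = np.array([[36.8, np.nan], [np.nan, 5.2]])
values, pattern = mixed_mode_representation(x)
```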

    On the Use of Time Series Kernel and Dimensionality Reduction to Identify the Acquisition of Antimicrobial Multidrug Resistance in the Intensive Care Unit

    Presentation at the 2021 KDD Workshop on Applied Data Science for Healthcare, 15.08.21 - 16.08.21. https://dshealthkdd.github.io/dshealth-2021/

    The acquisition of Antimicrobial Multidrug Resistance (AMR) in patients admitted to Intensive Care Units (ICUs) is a major global concern. This study analyses data in the form of multivariate time series (MTS) from 3476 patients recorded at the ICU of the University Hospital of Fuenlabrada (Madrid) from 2004 to 2020, 18% of whom acquired AMR during their stay in the ICU. The goal of this paper is the early prediction of the development of AMR. Towards that end, we leverage the time series cluster kernel (TCK) to learn similarities between MTS. To evaluate the effectiveness of TCK as a kernel, we applied several dimensionality reduction techniques for visualization and classification tasks. The experimental results show that TCK allows the identification of a group of patients who acquire AMR during the first 48 hours of their ICU stay, and it also provides good classification capabilities.
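
    A sketch of the visualization step is shown below: given a precomputed TCK similarity matrix over patients, kernel PCA projects them into two dimensions for plotting. The random placeholder matrix only keeps the snippet self-contained; in the paper's workflow the kernel would come from the TCK routine.

```python
# Hedged sketch: dimensionality reduction from a precomputed kernel.
import numpy as np
from sklearn.decomposition import KernelPCA

A = np.random.rand(100, 20)   # placeholder; stands in for a TCK computation
K = A @ A.T                   # symmetric PSD "similarity" matrix over patients
coords = KernelPCA(n_components=2, kernel="precomputed").fit_transform(K)
# coords can now be scattered and colored by AMR status for visual inspection.
```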

    On the Differential Analysis of Enterprise Valuation Methods as a Guideline for Unlisted Companies Assessment (I): Empowering Discounted Cash Flow Valuation

    The Discounted Cash Flow (DCF) method is probably the most widely used approach in company valuation, its main drawback being its well-known extreme sensitivity to key variables such as the Weighted Average Cost of Capital (WACC) and Free Cash Flow (FCF) estimates, which cannot be obtained unquestionably. In this paper we propose an unbiased and systematic DCF method which allows us to value private equity by leveraging stock market evidence, based on a twofold approach: first, the use of the inverse method assesses the existence of a coherent WACC that compares positively with market observations; second, different FCF forecasting methods are benchmarked and shown to correspond with actual valuations. We use historical financial data from 42 companies in five sectors, extracted from Eikon-Reuters. Our results show that WACC and FCF forecasts are not coherent with market expectations over time, across sectors, or across market regions when only historical and endogenous variables are taken into account. The best estimates are found when exogenous variables, operational normalization of the input space, and data-driven linear techniques are considered (Root Mean Square Error of 6.51). Our method suggests that FCFs and their positive alignment with Market Capitalization and the subordinate enterprise value are the most influential variables. The fine-tuning of the methods presented here, along with an exhaustive analysis using nonlinear machine-learning techniques, is developed and discussed in the companion paper.
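
    For reference, the textbook DCF computation that drives this sensitivity is sketched below: forecast free cash flows are discounted at the WACC and a Gordon-growth terminal value is added. The input figures are illustrative and not taken from the paper's data; small changes in wacc or g move the result substantially, which is exactly the sensitivity the paper addresses.

```python
# Hedged sketch of a standard DCF enterprise valuation (illustrative inputs).
def dcf_enterprise_value(fcf, wacc, g):
    """fcf: forecast free cash flows; wacc: discount rate; g: terminal growth (g < wacc)."""
    pv = sum(cf / (1 + wacc) ** t for t, cf in enumerate(fcf, start=1))
    terminal = fcf[-1] * (1 + g) / (wacc - g)      # Gordon growth terminal value
    return pv + terminal / (1 + wacc) ** len(fcf)

ev = dcf_enterprise_value([100, 110, 120], wacc=0.09, g=0.02)
```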

    Opening the 21st Century Technologies to Industries: On the Special Issue Machine Learning for Society

    Machine learning techniques, more commonly known today as artificial intelligence, are playing an increasingly important role in all aspects of our lives. Their applications extend to all areas of society where such techniques can be adapted to provide efficient and interesting solutions to a wide range of problems. In this Special Issue, entitled Machine Learning for Society [1], we present some examples of the applications of this type of technique. From the valuation of unlisted companies to the characterization of clients, and from the detection of financial crises to the prediction of exchange rate behavior, the works presented here have in common the search for efficient solutions based on historical data and the application of artificial intelligence techniques. The techniques and datasets used, as well as the relevant findings developed in the different articles of this Special Issue, are summarized below.

    On the Differential Analysis of Enterprise Valuation Methods as a Guideline for Unlisted Companies Assessment (II): Applying Machine-Learning Techniques for Unbiased Enterprise Value Assessment

    The search for an unbiased company valuation method to reduce uncertainty, whether automatic or not, has been a relevant topic in social sciences and business development for decades. Many methods have been described in the literature, but consensus has not been reached. In the companion paper we reviewed the assessment capabilities of the traditional company valuation model based on a company's intrinsic value using the Discounted Cash Flow (DCF). In this paper, we capitalize on the potential of exogenous information combined with Machine Learning (ML) techniques. To do so, we performed an extensive analysis evaluating the predictive capabilities of up to 18 different ML techniques. Endogenous variables (features) related to value creation (DCF) proved to be crucial elements for the models, while the incorporation of exogenous, industry- and country-specific variables incrementally improved the ML performance. Bagging Trees, Support Vector Machine Regression, and Gaussian Process Regression consistently provided the best results. We conclude that an unbiased model can be created based on endogenous and exogenous information to build a reference framework for pricing and benchmarking Enterprise Value for valuation and credit risk assessment.
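
    A minimal sketch of such a benchmarking loop is given below, comparing three of the families the paper highlights by cross-validated RMSE. The random arrays only make the snippet self-contained; the actual features, targets, and the full set of 18 techniques come from the paper's pipeline.

```python
# Hedged sketch of benchmarking regressors by RMSE (placeholder data).
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.svm import SVR
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.model_selection import cross_val_score

X, y = np.random.rand(200, 12), np.random.rand(200)  # placeholder features/targets

models = {
    "Bagging Trees": BaggingRegressor(),
    "SVM Regression": SVR(),
    "Gaussian Process": GaussianProcessRegressor(),
}
for name, model in models.items():
    rmse = -cross_val_score(model, X, y, cv=5,
                            scoring="neg_root_mean_squared_error").mean()
    print(f"{name}: RMSE = {rmse:.3f}")
```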